If the two samples share all their taxonomic groups, the percent Jaccard index is 100%
($\beta_{jac} = 100\%$); the closer the index is to 100%, the more similar the samples are.
If the two samples share no species/taxa, they are 0% similar ($\beta_{jac} = 0\%$). If the
percent Jaccard index is 50%, the two samples share half of the taxonomic groups.
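The interpretation above can be checked with a small Python sketch. It assumes the standard definition of the Jaccard index (number of shared taxa divided by the number of distinct taxa found in either sample, times 100); the function name `percent_jaccard` and the taxon labels are hypothetical and serve only as an illustration.

```python
def percent_jaccard(sample_a, sample_b):
    """Percent Jaccard similarity between two collections of taxon names.

    Assumption: standard Jaccard index, |A and B| / |A or B| * 100.
    """
    taxa_a, taxa_b = set(sample_a), set(sample_b)
    union = taxa_a | taxa_b
    if not union:
        raise ValueError("both samples are empty")
    shared = taxa_a & taxa_b
    return 100.0 * len(shared) / len(union)


# Hypothetical taxon labels, for illustration only.
sample_1 = {"taxon_A", "taxon_B", "taxon_C"}
sample_2 = {"taxon_A", "taxon_B", "taxon_D"}
sample_3 = {"taxon_X", "taxon_Y"}

print(percent_jaccard(sample_1, sample_1))  # identical samples: 100.0
print(percent_jaccard(sample_1, sample_2))  # 2 shared of 4 distinct taxa: 50.0
print(percent_jaccard(sample_1, sample_3))  # nothing shared: 0.0
```

Because only presence or absence of each taxon is used, the abundance of a taxon has no effect on the result.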
7.2.5.2.2 Bray–Curtis Dissimilarity Index
$$\beta_{br} = \left(1 - \frac{2c}{a + b}\right) \times 100 \qquad (7.14)$$
The percent Bray–Curtis dissimilarity is always a number between 0 and 100. If it is 0, then
the two samples share all the same species; if it is 100, that means the two samples do not
share any species.
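As a quick numerical check of Equation 7.14, the following Python sketch evaluates the percent Bray–Curtis dissimilarity for a few cases. It assumes that a and b are the numbers of species in the two samples and that c is the number of species they share; if a, b, and c are defined on abundances instead (total counts per sample and the sum of the lesser counts of shared species), the same function applies to those values. The function name `percent_bray_curtis` is hypothetical.

```python
def percent_bray_curtis(a, b, c):
    """Percent Bray-Curtis dissimilarity, (1 - 2c / (a + b)) * 100.

    Assumption: a and b describe the two samples and c their shared part,
    as in Equation 7.14.
    """
    if a + b == 0:
        raise ValueError("both samples are empty")
    return (1 - 2 * c / (a + b)) * 100


def percent_bray_curtis_from_taxa(sample_a, sample_b):
    """Presence/absence version computed from two collections of taxon names."""
    taxa_a, taxa_b = set(sample_a), set(sample_b)
    return percent_bray_curtis(len(taxa_a), len(taxa_b), len(taxa_a & taxa_b))


print(percent_bray_curtis(4, 4, 4))            # all species shared: 0.0
print(percent_bray_curtis(4, 4, 0))            # no species shared: 100.0
print(round(percent_bray_curtis(3, 3, 2), 1))  # partial overlap: 33.3
```

The two extreme cases reproduce the interpretation given above: 0 when every species is shared and 100 when none is.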
7.2.5.2.3 Unweighted and Weighted UniFrac Distance Index
UniFrac is a phylogeny-based beta diversity index that takes into account the evolutionary
relatedness of the communities in the two samples. The UniFrac distance is defined as the
fraction of the observed branch length of the phylogenetic tree that is unique to either
sample. If the communities of the two samples are identical, the UniFrac index is zero. If
the two communities are evolutionarily unrelated, that is, they share no branches of the
tree, the UniFrac index is 1.0. The more evolutionarily related the two communities are,
the closer the index is to zero. Weighted UniFrac uses the relative abundances of species
in the samples as weights on the branch lengths and therefore emphasizes the dominant
species, whereas unweighted UniFrac uses only presence or absence and therefore
emphasizes the rare species.
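To make the definition of the unweighted UniFrac concrete, here is a minimal Python sketch on a toy tree. It is an illustration under simplifying assumptions, not the implementation used by QIIME2: the tree is given as a hypothetical list of branches, each with a length and the set of tip taxa below it, and a branch counts as observed in a sample if at least one of those tips is present in the sample. Weighted UniFrac is not covered.

```python
def unweighted_unifrac(branches, sample_a, sample_b):
    """Fraction of observed branch length that is unique to one sample.

    branches: iterable of (length, set_of_tip_taxa_below_branch) pairs.
    sample_a, sample_b: collections of tip taxa present in each sample.
    """
    taxa_a, taxa_b = set(sample_a), set(sample_b)
    unique = 0.0
    observed = 0.0
    for length, tips in branches:
        in_a = bool(tips & taxa_a)  # branch leads to a taxon seen in A
        in_b = bool(tips & taxa_b)  # branch leads to a taxon seen in B
        if in_a or in_b:
            observed += length
            if in_a != in_b:        # seen in exactly one of the samples
                unique += length
    if observed == 0:
        raise ValueError("no branch of the tree is observed in either sample")
    return unique / observed


# Toy tree ((t1:1,t2:1):1,(t3:1,t4:1):1); all names are hypothetical.
toy_branches = [
    (1.0, {"t1"}), (1.0, {"t2"}), (1.0, {"t3"}), (1.0, {"t4"}),
    (1.0, {"t1", "t2"}),  # internal branch above t1 and t2
    (1.0, {"t3", "t4"}),  # internal branch above t3 and t4
]

print(unweighted_unifrac(toy_branches, {"t1", "t2"}, {"t1", "t2"}))  # 0.0
print(unweighted_unifrac(toy_branches, {"t1", "t2"}, {"t3", "t4"}))  # 1.0
print(unweighted_unifrac(toy_branches, {"t1", "t2"}, {"t2", "t3"}))  # 0.6
```

In the last case, five branches are observed and three of them (t1, t3, and the branch above t3 and t4) lead to taxa found in only one sample, which gives 3/5 = 0.6.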
7.3 DATA ANALYSIS WITH QIIME2
Now it is time to get your hands dirty with some worked examples that cover raw data
preprocessing, read clustering, denoising, taxonomic assignment, phylogenetic tree
construction, and diversity analysis. For this purpose, we will use QIIME2 (Quantitative
Insights Into Microbial Ecology 2) [15], which is the most commonly used free program for
the analysis of amplicon-based microbial sequencing data. QIIME2 can be used to analyze
any targeted gene sequencing data, but its modules for the analysis of metagenomic data
based on the 16S rRNA gene are very well established. QIIME2 can be installed on different
platforms. For detailed installation instructions, visit “https://docs.qiime2.org/2022.2/install/”.
If you have Anaconda installed on your Linux computer, you can install it with “conda
install -c qiime2 qiime2”; however, make sure that all requirements are met. We will use
QIIME2 under the Anaconda environment. After installing QIIME2 under Anaconda,
run “conda activate qiime2” on the Linux terminal to activate the QIIME2 environment
(use the name you gave the environment if it differs). Once it has been activated, the
terminal prompt will change to something like “(qiime2)$”. Then, you can run any
QIIME2 command. To display the available QIIME2 commands, run the following:
(qiime2)$ qiime